Towards Semantic Microaggregation of Categorical Data for Confidential Documents

Authors

  • Daniel Abril
  • Guillermo Navarro-Arribas
  • Vicenç Torra
Abstract

In the data privacy context, specifically in statistical disclosure control, microaggregation is a well-known microdata protection method that ensures the confidentiality of each individual. In this paper, we propose a new microaggregation approach to deal with semantic sets of categorical data, such as text documents. The method relies on WordNet, which provides a complete taxonomy of semantic relationships between words. This extension therefore aims to ensure the confidentiality of text documents while preserving their general meaning. We evaluate the quality of the protection method with measures based on information loss.

URL: http://www.springerlink.com/content/f41402862155w6t4/
Source URL: https://www.iiia.csic.es/en/node/54964
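The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the taxonomy `PARENT` is a hypothetical toy stand-in for WordNet hypernym links, the greedy grouping in `microaggregate` is a simplification, and `information_loss` is one assumed measure (mean taxonomy distance between a term and its replacement). Each group of at least k terms is replaced by its lowest common ancestor, so the terms become indistinguishable while a more general meaning is kept.

```python
# Hypothetical toy taxonomy (child -> parent); a stand-in for WordNet hypernym links.
PARENT = {
    "dog": "canine", "wolf": "canine", "canine": "mammal",
    "cat": "feline", "lion": "feline", "feline": "mammal",
    "mammal": "animal", "sparrow": "bird", "eagle": "bird", "bird": "animal",
}

def ancestors(term):
    """Return the path from term up to the taxonomy root, term included."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def common_ancestor(a, b):
    """Lowest common ancestor of two terms (assumes a single-rooted taxonomy)."""
    seen = set(ancestors(a))
    return next(node for node in ancestors(b) if node in seen)

def distance(a, b):
    """Number of edges between a and b via their lowest common ancestor."""
    lca = common_ancestor(a, b)
    return ancestors(a).index(lca) + ancestors(b).index(lca)

def microaggregate(terms, k):
    """Greedily group distinct terms into clusters of at least k terms and
    map every term to the lowest common ancestor of its cluster."""
    remaining = list(terms)
    masked = {}
    while remaining:
        seed = remaining[0]
        # When fewer than 2k terms remain, take them all so no cluster is smaller than k.
        size = k if len(remaining) >= 2 * k else len(remaining)
        cluster = sorted(remaining, key=lambda t: distance(seed, t))[:size]
        prototype = cluster[0]
        for term in cluster[1:]:
            prototype = common_ancestor(prototype, term)
        for term in cluster:
            masked[term] = prototype
            remaining.remove(term)
    return masked

def information_loss(masked):
    """Assumed measure: mean taxonomy distance between a term and its replacement."""
    return sum(distance(t, p) for t, p in masked.items()) / len(masked)

terms = ["dog", "wolf", "cat", "lion", "sparrow", "eagle"]
masked_k2 = microaggregate(terms, k=2)
masked_k3 = microaggregate(terms, k=3)
print(masked_k2, information_loss(masked_k2))
print(masked_k3, information_loss(masked_k3))
```

In this toy setting a larger k forces more general replacements, so the assumed information-loss value grows with k, which is the privacy versus utility trade-off the abstract refers to.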


Similar resources


Scientific papers on semantics and aggregation procedures for SDC of qualitative variables

Microaggregation is a masking procedure used for protecting confidential data prior to their public release. This technique, which relies on clustering and aggregation, is used solely for numerical data. In this work we introduce a microaggregation procedure for categorical variables. We describe the new masking method and we analyse the results it obtains according to some indices fo...


Towards a private vector space model for confidential documents

In this paper we introduce a method to anonymize document vector spaces. These vector spaces can be used to analyze confidential documents without disclosing private information. The method is inspired by microaggregation, a popular technique used in statistical disclosure control. URL: http://doi.acm.org/10.1145/2480362.2480543 Source URL: https://www.iiia.csic.es/en/node/54488


Semantic adaptive microaggregation of categorical microdata

In the context of Statistical Disclosure Control, microaggregation is a privacy-preserving method that aims to mask sensitive microdata prior to publication. It iteratively creates clusters of at least k elements and replaces them with their prototype so that they become k-indistinguishable (anonymous). This data transformation produces a loss of information with regard to the original dataset wh...


Spherical microaggregation: Anonymizing sparse vector spaces

Unstructured texts are a very popular data type and still largely unexplored in the privacy-preserving data mining field. We consider the problem of providing public information about a set of confidential documents. To that end we have developed a method to protect a Vector Space Model (VSM), to make it public even if the documents it represents are private. This method is inspired by ...




Publication date: 2010